Image Titles - Variations on Show, Attend and Tell

Authors

  • Vincent Sitzmann
  • Timon Ruban
  • Robert Konrad
Abstract

Inspired by recent advances in machine translation and object detection, we implement an image captioning pipeline, consisting of a Fully Convolutional Neural Network piping image features into an image-captioning LSTM, based on the popular Show, Attend, and Tell model. We implement the model in TensorFlow and recreate performance metrics reported in the paper. We identify and experiment with variations on the model, and evaluate them via a series of experiments on the MS COCO benchmark dataset.

Similar resources

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to a...

Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning

Modern neural image captioning systems typically adopt the encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation. Inspired by the robustness analysis of CNN-based image classifiers to adversarial perturbations, we propose Show-and-Fool, a novel algorithm for ...

Uprising in “Uprising”: A Multimodal Analysis of Bob Marley’s Lyrics

This paper investigates how the theme of uprising is conveyed in Bob Marley’s final music album by the name “Uprising”. Through the methodological lenses of multimodality, attention is focused on how the album cover design, lexical items, literary devices, and other aesthetic ways such as the titles of the ten songs of the album and their order of arrangement contribute to the overall theme of ...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

Statistical Review of Theses in Iran Public Universities in Medical Imaging Field

Introduction: In spite of all the researches on medical imaging, this field is still hot. It is the third major area of research, conducted in the world. Unfortunately, there is no statistical evaluation available of the student theses in Iran. It does not seem like a policy is going to be established to address this issue. Providing a statistical insight to activities done in ...

Publication date: 2017